
Here I want to discuss the main probability distributions, to the best of my knowledge. Probability fascinates me because it has applications across so many areas of science. The principal distributions needed to understand statistical inference and to apply statistical models are the Bernoulli, Binomial, Negative-Binomial, Poisson, Normal, and Gamma. Of course, there are many other important distributions that I will not cover here. For each one, I will try to explain the support and the parameters as well as the idea behind it.

Bernoulli distribution

The first, and the most famous, is the Bernoulli distribution. Let X be a binary random variable with probability mass function (PMF) f_{X}. Then X \sim Ber(p) has PMF

f(x) = \mathrm{P}(X = x) = p^{x} (1-p)^{1-x}

where the support is X \in \{0, 1\} and the parameter space is p \in (0, 1). The expected value (mathematical expectation) is \mathrm{E}(X) = p and the variance is \mathrm{Var}(X) = p(1 - p).
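As a quick numerical check of these formulas, here is a minimal R sketch. The value p = 0.3, the number of simulated draws, and the helper name bernoulli_pmf are illustrative assumptions, not part of the original text. The Bernoulli case is obtained from R's Binomial functions dbinom and rbinom by setting size = 1.

```r
# Hypothetical helper: Bernoulli PMF written directly from the formula above
bernoulli_pmf <- function(x, p) p^x * (1 - p)^(1 - x)

p <- 0.3          # illustrative success probability (assumption)
x <- c(0, 1)      # support of X

pmf <- bernoulli_pmf(x, p)

# Mean and variance computed from the PMF by direct summation
mean_x <- sum(x * pmf)
var_x  <- sum((x - mean_x)^2 * pmf)

# The same PMF from base R: Bernoulli is Binomial with size = 1
pmf_r <- dbinom(x, size = 1, prob = p)

# Simulated draws should have sample mean and variance close to p and p(1 - p)
set.seed(123)
draws <- rbinom(10000, size = 1, prob = p)

print(c(mean = mean_x, var = var_x))
print(c(sim_mean = mean(draws), sim_var = var(draws)))
```

The summation-based mean and variance match p and p(1 - p) exactly, while the simulated values only approximate them.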

I will not discuss moments in detail here; it is enough to know that the first moment is the mathematical expectation and the second is related to the variance. Wikipedia is a good place to start if you want to study this topic further. I love this concept because everything in statistical modeling is linked to a mean, especially in generalized linear models (GLM). But that is a topic for a later post.

Below is code that draws fifteen values of p at random and plots the resulting Bernoulli distributions. In the chart, the x-axis shows the two outcomes, X = 1 and X = 0, and the y-axis shows the probabilities \mathrm{P}(X = 1) and \mathrm{P}(X = 0), with one bar per sampled p. Another property to keep in mind is that \mathrm{P}(X = 1) + \mathrm{P}(X = 0) = 1.

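set.seed(123)

# Candidate success probabilities strictly inside (0, 1)
value <- seq(1e-06, 0.999999, by = 0.001)

# Draw fifteen values of p at random; q is the complementary probability
p <- sample(value, size = 15, replace = TRUE)
q <- 1 - p

# One column per outcome: P(X = 1) = p and P(X = 0) = 1 - p
data0 <- cbind(`X = 1` = p, `X = 0` = q)

# Grouped bars: one bar per sampled p within each outcome
barplot(data0, beside = TRUE, main = "Bernoulli distribution",
    xlab = "Realization of the variable",
    ylab = "Probability", col = rainbow(15))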

The Bernoulli distribution gets a lot of attention in many areas, mainly because of its applications. Whenever the response variable has only two possible outcomes, such as good/bad, logistic regression is the appropriate model to use.
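To make the link to logistic regression concrete, here is a minimal sketch on simulated data. Everything in it (the covariate x, the coefficients -0.5 and 1.5, the sample size) is an assumption chosen for illustration. Only glm with family = binomial is standard base R.

```r
set.seed(123)

n_obs <- 500                             # illustrative sample size (assumption)
x <- rnorm(n_obs)                        # hypothetical covariate
eta <- -0.5 + 1.5 * x                    # linear predictor with assumed coefficients
p <- plogis(eta)                         # inverse logit: P(Y = 1 | x)
y <- rbinom(n_obs, size = 1, prob = p)   # Bernoulli responses

# Logistic regression: Bernoulli response with a logit link
fit <- glm(y ~ x, family = binomial(link = "logit"))

# The estimates should land near the assumed values -0.5 and 1.5
print(coef(fit))
```

Because each response is Bernoulli with mean p, the model describes how that mean changes with x, which is the sense in which the model is "linked to a mean".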

Binomial distribution

The Binomial distribution is essential mainly because of its applications in experimental work in health, agronomy, and other sciences. Let X be a discrete random variable counting the successes in n independent trials, with probability mass function (PMF) f_{X}. Then X \sim Bin(n, p) has PMF

f(x) = \mathrm{P}(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}

where the support is X \in \{0, 1, \dots, n\} (the number of successes) and the parameter space is p \in (0, 1), the success probability for each trial, and n \in \{0, 1, \dots\}, the number of trials. The expected value is \mathrm{E}(X) = np and the variance is \mathrm{Var}(X) = np(1 - p).
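The formulas above can be checked numerically with a short R sketch. The values n = 30 and p = 0.4 and the helper name binomial_pmf are illustrative assumptions; dbinom is the base R function for the Binomial PMF.

```r
# Hypothetical helper: Binomial PMF written directly from the formula above
binomial_pmf <- function(x, n, p) choose(n, x) * p^x * (1 - p)^(n - x)

n <- 30           # number of trials (illustrative)
p <- 0.4          # success probability (illustrative)
x <- 0:n          # support of X

pmf <- binomial_pmf(x, n, p)

# Mean and variance computed from the PMF by direct summation
mean_x <- sum(x * pmf)
var_x  <- sum((x - mean_x)^2 * pmf)

print(c(mean = mean_x, np = n * p))
print(c(var = var_x, npq = n * p * (1 - p)))

# The hand-written PMF should agree with base R's dbinom
print(all.equal(pmf, dbinom(x, size = n, prob = p)))
```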

As for n, I prefer to treat it as a fixed value rather than a parameter. The reason is that you always know this value beforehand, whereas a parameter is something you estimate from the sample rather than something you have in advance. In that sense it plays a role similar to a hyperparameter in machine learning techniques.

The graph below shows how different values of p affect the shape of the Binomial probability mass function.

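# Draw the four panels on a 2 x 2 grid
par(mfrow = c(2, 2))

n <- 30                          # number of trials
success <- seq(0, n)             # support: 0, 1, ..., n
prob <- c(0.2, 0.4, 0.6, 0.8)    # success probabilities to compare

for (i in seq_along(prob)) {

    # Exact probabilities P(X = x); nothing is simulated, so no seed is needed
    dens <- dbinom(success, size = n,
        prob = prob[i])
    name <- paste0("Binomial Distribution (n=",
        n, ", p=", prob[i], ")")

    plot(success, dens, type = "h",
        main = name, ylab = "Probability",
        xlab = "Successes", lwd = 3)
}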

The model that follows from this distribution is the dose-response model used in experiments, and this idea also comes from GLMs. I will explain this model in a future post and link it here once it is ready.

Poisson distribution

When the response variable you need to analyze is a count, that is, a non-negative integer, the Poisson distribution is the natural place to begin your study. Let X be a discrete random variable with probability mass function (PMF) f_{X}. Then X \sim Pois(\lambda) has PMF

f(x) = \mathrm{P}(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}

where the support is X \in \{0, 1, 2, \dots\} (the number of events) and the parameter space is \lambda \in (0, +\infty), the rate. The expected value and the variance are equal: \mathrm{E}(X) = \mathrm{Var}(X) = \lambda. Below is code to generate data from the Poisson distribution in R.
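A minimal sketch of such a simulation follows. The rate \lambda = 4 and the 1000 draws are illustrative choices, not values from the original post; rpois and dpois are the base R functions for Poisson draws and the Poisson PMF.

```r
set.seed(123)

lambda <- 4          # illustrative rate (assumption)
n_draws <- 1000      # illustrative number of draws (assumption)

# Generate counts from the Poisson distribution
counts <- rpois(n_draws, lambda = lambda)

# Sample mean and variance should both be close to lambda
print(c(mean = mean(counts), var = var(counts)))

# Compare observed relative frequencies with the theoretical PMF
x <- 0:max(counts)
observed <- as.numeric(table(factor(counts, levels = x))) / n_draws
theoretical <- dpois(x, lambda = lambda)

plot(x, theoretical, type = "h", lwd = 3,
    main = paste0("Poisson Distribution (lambda=", lambda, ")"),
    xlab = "Count", ylab = "Probability")
points(x, observed, pch = 19, col = "red")   # simulated frequencies
```

The vertical lines show the theoretical probabilities and the red points show the simulated relative frequencies, which should sit close to the tops of the lines.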

REFERENCES

Agresti, Alan. 2015. Foundations of Linear and Generalized Linear Models. John Wiley & Sons.
DeGroot, Morris H, and Mark J Schervish. 2012. Probability and Statistics. Pearson Education.
R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Rigby, Robert A, Mikis D Stasinopoulos, Gillian Z Heller, and Fernanda De Bastiani. 2019. Distributions for Modeling Location, Scale, and Shape: Using GAMLSS in R. CRC Press.